Problem Statement


The file sat.txt contains spectral values of pixels in a satellite image, used to predict the soil class. Analyze the data using different methods and compare them via cross-validation. Then apply the models to the test sample sat.tst and analyze the results obtained.

The database contains the multi-spectral values of the pixels in 3x3 neighborhoods of a satellite image, together with the classification associated with the central pixel of each neighborhood. The goal is to predict this classification from the multi-spectral values.

This database was generated from the LANDSAT Multi-Spectral Scanner (MSS). A Landsat MSS frame consists of four digital images of the same scene in different spectral bands: two in the visible range (corresponding roughly to the green and red regions of the visible spectrum) and two in the infrared. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80 m x 80 m, and each image contains 2340 x 3380 such pixels.

The database is a (small) sub-area of one scene, consisting of 82 x 100 pixels. Each row of the dataset corresponds to a 3x3 square pixel neighborhood fully contained within the 82x100 sub-area. It holds the values in the four spectral bands (converted to ASCII) for each of the 9 pixels in the 3x3 neighborhood, plus a number encoding the classification label of the central pixel.

The class of each pixel is encoded as a number. Matching the mapping applied in the code below, the classes are:

  • 1: red ("rojo")
  • 2: cotton ("algodón")
  • 3: grey ("gris")
  • 4: damp grey ("gris húmedo")
  • 5: vegetation ("vegetación")
  • 6: very damp grey ("gris muy húmedo"; labeled 7 in the raw files and recoded to 6 below)

In each data line, the four spectral values for the top-left pixel come first, followed by the four values for the top-middle pixel and then those for the top-right pixel, and so on, so that the pixels are read in sequence left to right and top to bottom. The four spectral values of the central pixel are therefore attributes 17, 18, 19, and 20. If desired, one may use only these four attributes and ignore the rest.
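As a quick illustration of this layout (a sketch, not part of the original analysis), the band columns of any of the nine pixels can be derived instead of hard-coded:

# Columns holding the 4 band values of pixel p (p = 1..9, read row-major)
band_cols <- function(p) ((p - 1) * 4 + 1):(p * 4)
band_cols(5)  # central pixel -> 17 18 19 20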


Procedure



Obtaining the Data



Since the problem statement notes that only the spectral values of the central pixel may be used, we extract them, together with their classification, into a single data frame.

library(dplyr)  # pipes, select() and rename()

# Read the training data: 36 spectral attributes plus a class label, no header
file_trn = "SAT_trn.txt"
data_trn <- read.table(file_trn, sep = "", header=F)
# Keep the 4 bands of the central pixel (columns 17-20) and the class (column 37)
sat_trn_df <- data_trn      %>% 
  dplyr::select(17:20,37)  %>% 
  rename(B1=V17,
         B2=V18,
         B3=V19,
         B4=V20,
         C=V37)
# Same extraction for the test data
file_tst = "SAT_tst.txt"
data_tst <- read.table(file_tst, sep = "", header=F)
sat_tst_df <- data_tst      %>% 
  dplyr::select(17:20,37)  %>% 
  rename(B1=V17,
         B2=V18,
         B3=V19,
         B4=V20,
         C=V37)
# The raw files use labels 1-5 and 7; recode 7 as 6 so the classes are contiguous
sat_trn_df$C[sat_trn_df$C == 7] <- 6
sat_tst_df$C[sat_tst_df$C == 7] <- 6
# Keep numeric-coded copies before replacing the labels with names
sat_trn_df_numeric <- data.frame(sat_trn_df)
sat_tst_df_numeric <- data.frame(sat_tst_df)
class_names <- c("rojo", "algodón", "gris","gris húmedo", "vegetación", "gris muy húmedo")
class_values <- c(1,2,3,4,5,6)
class_map <- setNames(class_names, class_values)
# Map numeric labels to names; store as a factor, which randomForest and caret
# expect for classification
sat_trn_df$C <- factor(class_map[sat_trn_df$C])
sat_tst_df$C <- factor(class_map[sat_tst_df$C])
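As a quick sanity check (not shown in the original), the class distribution of the training frame can be inspected:

# Class counts over the 4435 training rows
table(sat_trn_df$C)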

Viewing the Data
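The first rows of the extracted training frame, presumably produced by something like:

head(sat_trn_df, 10)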



B1 B2 B3 B4 C
92 112 118 85 gris
84 103 104 81 gris
84 99 104 78 gris
84 99 104 81 gris
76 99 104 81 gris
76 99 108 85 gris
80 112 118 88 gris
80 107 113 85 gris
76 91 104 74 gris húmedo
76 95 100 78 gris húmedo






Model Building


Decision Tree

set.seed(20)
n <- nrow(sat_trn_df)
# Fit a full classification tree (cp = -1) with 10-fold cross-validation
sat_rpart <-  rpart(C ~ B1 + B2 + B3 + B4, data= sat_trn_df, method = "class", cp=-1,xval=10)
sat_rpart_summary <- printcp(sat_rpart)
## 
## Classification tree:
## rpart(formula = C ~ B1 + B2 + B3 + B4, data = sat_trn_df, method = "class", 
##     cp = -1, xval = 10)
## 
## Variables actually used in tree construction:
## [1] B1 B2 B3 B4
## 
## Root node error: 3363/4435 = 0.75829
## 
## n= 4435 
## 
##             CP nsplit rel error  xerror      xstd
## 1   2.6167e-01      0   1.00000 1.00000 0.0084779
## 2   2.5335e-01      1   0.73833 0.76896 0.0097636
## 3   1.2905e-01      2   0.48498 0.48498 0.0095487
## 4   6.3634e-02      3   0.35593 0.35742 0.0088020
## 5   1.7544e-02      4   0.29230 0.29379 0.0082400
## 6   1.4570e-02      5   0.27475 0.28189 0.0081181
## 7   6.6905e-03      7   0.24561 0.24234 0.0076694
## 8   5.9471e-03     10   0.22034 0.23253 0.0075467
## 9   5.6497e-03     11   0.21439 0.22629 0.0074660
## 10  4.7577e-03     13   0.20309 0.21885 0.0073673
## 11  4.6090e-03     14   0.19833 0.20934 0.0072364
## 12  3.2709e-03     16   0.18912 0.20636 0.0071945
## 13  1.6354e-03     17   0.18585 0.19774 0.0070698
## 14  1.3381e-03     19   0.18258 0.19625 0.0070479
## 15  1.1894e-03     21   0.17990 0.19536 0.0070346
## 16  1.0903e-03     22   0.17871 0.19477 0.0070258
## 17  8.9206e-04     25   0.17544 0.19715 0.0070610
## 18  6.9382e-04     31   0.17009 0.19715 0.0070610
## 19  5.9471e-04     34   0.16800 0.19715 0.0070610
## 20  4.9559e-04     48   0.15968 0.19685 0.0070567
## 21  4.4603e-04     51   0.15819 0.19387 0.0070124
## 22  3.9647e-04     55   0.15641 0.19328 0.0070035
## 23  2.9735e-04     58   0.15522 0.19328 0.0070035
## 24  1.4868e-04     61   0.15433 0.20012 0.0071047
## 25  8.4958e-05     63   0.15403 0.20101 0.0071176
## 26  7.4338e-05     70   0.15343 0.20071 0.0071133
## 27  0.0000e+00     74   0.15314 0.19952 0.0070960
## 28 -1.0000e+00    197   0.15314 0.19952 0.0070960
plotcp(sat_rpart) 

# Row of the CP table with the lowest cross-validated error (column 4 = xerror)
t_nodes <- which.min(sat_rpart$cptable[,4])
# Complexity parameter attaining that minimum
cp <- sat_rpart$cptable[which.min(sat_rpart$cptable[,"xerror"]),"CP"]
# CV error estimate: root-node error rate times the minimum relative xerror
CVErr_rpart = sat_rpart$frame[1,'dev'] / n * sat_rpart$cptable[which.min(sat_rpart$cptable[,"xerror"]),"xerror"]
# Prune at the selected cp and plot the resulting tree
poda_rpart <- prune(sat_rpart, cp=cp)
rpart.plot(poda_rpart, cex = 0.5, extra = 0)

Random Forest

set.seed(20)
# Classification forest on the 4 central-pixel bands; mtry predictors are
# tried at each split (here sqrt(4) = 2)
sat_rf <- randomForest(C~., data= sat_trn_df, 
                     mtry=sqrt(ncol(sat_trn_df)-1), importance=TRUE)

importance(sat_rf)
##            1        2         3        4         5        6
## B1 227.87903 29.96135 362.16076 58.21353  50.21856 48.37936
## B2 140.86987 43.00905  53.51638 37.62347 112.03571 41.93234
## B3  30.04423 10.40657  24.77468 15.11574  28.56435 32.31841
## B4  57.76868 24.80617  34.26451 35.82601  52.16778 74.12139
##    MeanDecreaseAccuracy MeanDecreaseGini
## B1            186.14425        1186.7218
## B2            119.11363         914.1809
## B3             48.88156         425.2869
## B4             91.94379         853.1770
varImpPlot(sat_rf)

# OOB error of the full forest (last row of err.rate), used here as the
# CV-error analogue for Random Forest
CVErr_rf <- unname(sat_rf$err.rate[nrow(sat_rf$err.rate),1])
CVErr_rf
## [1] 0.1492672

K-Nearest Neighbors

set.seed(20)
# Tune k over 1..100 with 10-fold cross-validation, centering and scaling the predictors
sat_knn <- train(C ~ ., data = sat_trn_df, 
                   method = "knn",
                   preProcess = c("center","scale"),
                   trControl =  trainControl(method = "cv", number = 10),
                   tuneGrid = expand.grid(k = 1:100),
                   metric = "Accuracy")
results <- sat_knn$results
ggplot(results, aes(x = k, y = Accuracy)) + 
  geom_line(color = "steelblue", size = 1.2) + 
  xlab("Valor de K") + 
  ylab("Exactitud") + 
  ggtitle("Rendimiento para diferentes valores de K") +
  theme_bw() + 
  theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
        axis.title.x = element_text(face = "bold", size = 12),
        axis.title.y = element_text(face = "bold", size = 12),
        axis.text = element_text(size = 10),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank())

best_k <- results$k[which.max(results$Accuracy)]
# CV error at the best k (equivalently 1 - max accuracy; indexing results by
# row number only works because rows correspond exactly to k = 1..100)
CVErr_KNN <- 1 - max(sat_knn$results$Accuracy)

Multinomial Logistic Regression
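The chunk that fits sat_mlr is not shown; judging by the call echoed in the summary below (caret's internal invocation of nnet::multinom), it was presumably trained along these lines (a sketch; the exact control settings are assumptions):

# Presumed fit: caret's method = "multinom" wraps nnet::multinom and
# cross-validates the weight-decay parameter
set.seed(20)
sat_mlr <- train(C ~ ., data = sat_trn_df,
                 method = "multinom",
                 trControl = trainControl(method = "cv", number = 10),
                 trace = FALSE)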

summary(sat_mlr)
## Call:
## nnet::multinom(formula = .outcome ~ ., data = dat, decay = param$decay)
## 
## Coefficients:
##   (Intercept)        B1          B2          B3         B4
## 2   -9.921154 0.5867093 -0.63317495  0.29943995 -0.1122932
## 3  -36.648978 0.7617752 -0.02673514  0.02795903 -0.2534971
## 4  -10.253548 0.5886442 -0.07999703 -0.02004337 -0.2854203
## 5    7.594031 0.4558313 -0.52173889  0.21832973 -0.2099831
## 6    4.599893 0.5678520 -0.18940106 -0.05394639 -0.2887664
## 
## Std. Errors:
##   (Intercept)         B1         B2         B3         B4
## 2   1.6977543 0.05040409 0.04477873 0.05284647 0.04508065
## 3   1.2832527 0.03391253 0.03743092 0.04631104 0.04434272
## 4   0.8182467 0.03437754 0.03467115 0.04409015 0.04040293
## 5   0.9108139 0.03161938 0.03489501 0.04077884 0.03601165
## 6   0.7841282 0.03491517 0.03359844 0.04276092 0.03778148
## 
## Residual Deviance: 3755.425 
## AIC: 3805.425
# Predicting on the training data would give the apparent (training) error,
# not a CV estimate; take the CV error from caret's resampling results instead
CVErr_mlr <- 1 - max(sat_mlr$results$Accuracy)

Linear Discriminant Analysis

set.seed(20)
# Stepwise variable selection for LDA, scored by the 10-fold cross-validated
# correctness rate; stop once the improvement falls below 1%
sat_LDA <- stepclass(C ~ B1 + B2 + B3 + B4, data = sat_trn_df, method = "lda",
                     criterion = "CR", direction = "both",
                     improvement = 0.01, fold = 10)
##  `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
## 4435 observations of 4 variables in 6 classes; direction: both
## stop criterion: improvement less than 1%.
## correctness rate: 0.56054;  in: "B2";  variables (1): B2 
## correctness rate: 0.79776;  in: "B1";  variables (2): B2, B1 
## correctness rate: 0.82616;  in: "B3";  variables (3): B2, B1, B3 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.28
sat_LDA
## method      : lda 
## final model : C ~ B1 + B2 + B3
## <environment: 0x0000021a9d2790b8>
## 
## correctness rate = 0.8262
# CV error = 1 - cross-validated correctness rate
CVErr_LDA <- 1-unname(sat_LDA$result.pm["crossval.rate"])

Quadratic Discriminant Analysis

set.seed(20)
# Same stepwise selection, now for QDA
sat_QDA <- stepclass(C ~ B1 + B2 + B3 + B4, data = sat_trn_df, method = "qda",
                     criterion = "CR", direction = "both",
                     improvement = 0.01, fold = 10)
##  `stepwise classification', using 10-fold cross-validated correctness rate of method qda'.
## 4435 observations of 4 variables in 6 classes; direction: both
## stop criterion: improvement less than 1%.
## correctness rate: 0.58107;  in: "B4";  variables (1): B4 
## correctness rate: 0.80496;  in: "B1";  variables (2): B4, B1 
## correctness rate: 0.84849;  in: "B2";  variables (3): B4, B1, B2 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00        2.44
sat_QDA
## method      : qda 
## final model : C ~ B1 + B2 + B4
## <environment: 0x0000021aadc04130>
## 
## correctness rate = 0.8485
CVErr_QDA <- 1-unname(sat_QDA$result.pm["crossval.rate"])

Comparison of the 10-fold CV Errors

models <- c("rpart", "RF", "KNN", "RLM","LDA","QDA")
CVErrs <- c(CVErr_rpart,CVErr_rf,CVErr_KNN,CVErr_mlr,CVErr_LDA,CVErr_QDA)

CVErr_df <- data.frame(Label = models, Value = CVErrs)

CVErr_plot <- ggplot(CVErr_df, aes(x = Label, y = Value)) + 
  geom_bar(stat = "identity", color = "black", fill = "red") +
  labs(x = "Models", y = "Cross-Validation Error (K=10)", title = "Comparison of Model CV Errors") +
  theme(plot.title = element_text(hjust = 0.5))

CVErr_plot


Evaluating the Models on the Test Set


Decision Tree

Random Forest

Multinomial Logistic Regression

K-Nearest Neighbors

Linear Discriminant Analysis

Quadratic Discriminant Analysis
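The per-model evaluation code is not shown; the CLErr_* values compared below were presumably computed along these lines (a sketch; the use of the pruned tree and of the stepclass-selected formulas are assumptions):

# Presumed test-set classification errors: fraction of misclassified pixels
CLErr_rpart <- mean(predict(poda_rpart, newdata = sat_tst_df, type = "class") != sat_tst_df$C)
CLErr_rf    <- mean(predict(sat_rf,  newdata = sat_tst_df) != sat_tst_df$C)
CLErr_knn   <- mean(predict(sat_knn, newdata = sat_tst_df) != sat_tst_df$C)
CLErr_mlr   <- mean(predict(sat_mlr, newdata = sat_tst_df) != sat_tst_df$C)
# For LDA/QDA, refit on the selected variables (assuming stepclass stores the
# final formula in $formula), then predict on the test set
lda_fit <- MASS::lda(sat_LDA$formula, data = sat_trn_df)
qda_fit <- MASS::qda(sat_QDA$formula, data = sat_trn_df)
CLErr_LDA <- mean(predict(lda_fit, sat_tst_df)$class != sat_tst_df$C)
CLErr_QDA <- mean(predict(qda_fit, sat_tst_df)$class != sat_tst_df$C)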

Comparison of the Models' Classification Errors

models <- c("rpart", "RF", "KNN", "RLM","LDA","QDA")
# Test-set classification errors (CLErr, not the CV errors above)
CLErrs <- c(CLErr_rpart,CLErr_rf,CLErr_knn,CLErr_mlr,CLErr_LDA,CLErr_QDA)

CLErr_df <- data.frame(Label = models, Value = CLErrs)

CErr_plot <- ggplot(CLErr_df, aes(x = Label, y = Value)) + 
  geom_bar(stat = "identity", color = "black", fill = "red") +
  labs(x = "Models", y = "Classification Error", title = "Comparison of Model Classification Errors") +
  theme(plot.title = element_text(hjust = 0.5))

CErr_plot


Analysis with the Full Dataset
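The construction of the full-attribute data frames is not shown; given that the model calls below reference sat_trn_full_df with predictors V1-V36 and class C, it was presumably something like this (the test-frame name is an assumption):

# Presumed construction: keep all 36 spectral attributes, rename the label
sat_trn_full_df <- data_trn %>% rename(C = V37)
sat_tst_full_df <- data_tst %>% rename(C = V37)
sat_trn_full_df$C[sat_trn_full_df$C == 7] <- 6
sat_tst_full_df$C[sat_tst_full_df$C == 7] <- 6
sat_trn_full_df$C <- factor(class_map[sat_trn_full_df$C])
sat_tst_full_df$C <- factor(class_map[sat_tst_full_df$C])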


Decision Tree
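The fitting code is not shown, but the call echoed in the output below is explicit; only the object name is an assumption:

set.seed(20)
sat_rpart_full <- rpart(C ~ ., data = sat_trn_full_df, method = "class", cp = -1, xval = 10)
printcp(sat_rpart_full)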

## 
## Classification tree:
## rpart(formula = C ~ ., data = sat_trn_full_df, method = "class", 
##     cp = -1, xval = 10)
## 
## Variables actually used in tree construction:
##  [1] V1  V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V2  V20 V21 V22 V23 V24 V25 V26
## [20] V27 V28 V29 V3  V30 V31 V32 V33 V34 V35 V36 V4  V5  V6  V7  V8  V9 
## 
## Root node error: 3363/4435 = 0.75829
## 
## n= 4435 
## 
##             CP nsplit rel error  xerror      xstd
## 1   0.26167113      0   1.00000 1.00000 0.0084779
## 2   0.25334523      1   0.73833 0.76896 0.0097636
## 3   0.12905144      2   0.48498 0.48498 0.0095487
## 4   0.06363366      3   0.35593 0.35742 0.0088020
## 5   0.01754386      4   0.29230 0.29497 0.0082518
## 6   0.01486768      5   0.27475 0.28338 0.0081337
## 7   0.01159679      6   0.25989 0.25453 0.0078153
## 8   0.00683913      7   0.24829 0.24591 0.0077128
## 9   0.00669045      9   0.23461 0.23877 0.0076253
## 10  0.00624442     13   0.20458 0.23699 0.0076030
## 11  0.00490633     15   0.19209 0.21588 0.0073269
## 12  0.00356824     17   0.18228 0.21053 0.0072531
## 13  0.00297354     18   0.17871 0.20607 0.0071903
## 14  0.00267618     19   0.17574 0.20517 0.0071776
## 15  0.00237883     21   0.17038 0.20369 0.0071563
## 16  0.00223015     22   0.16800 0.19923 0.0070916
## 17  0.00208147     24   0.16354 0.19923 0.0070916
## 18  0.00178412     26   0.15938 0.19744 0.0070654
## 19  0.00168500     27   0.15760 0.19685 0.0070567
## 20  0.00163544     30   0.15254 0.19685 0.0070567
## 21  0.00148677     32   0.14927 0.19506 0.0070302
## 22  0.00133809     36   0.14332 0.19387 0.0070124
## 23  0.00118941     40   0.13797 0.19447 0.0070213
## 24  0.00104074     50   0.12608 0.19387 0.0070124
## 25  0.00089206     55   0.12073 0.19120 0.0069721
## 26  0.00077312     57   0.11894 0.19060 0.0069631
## 27  0.00074338     62   0.11508 0.19090 0.0069676
## 28  0.00059471     64   0.11359 0.19090 0.0069676
## 29  0.00049559     66   0.11240 0.19179 0.0069811
## 30  0.00029735     69   0.11091 0.19269 0.0069946
## 31  0.00019824     74   0.10943 0.19804 0.0070742
## 32  0.00014868     77   0.10883 0.19893 0.0070873
## 33  0.00000000     83   0.10794 0.19982 0.0071003
## 34 -1.00000000    157   0.10794 0.19982 0.0071003

The index of the CP-table row with the lowest cross-validated error (row 26, xerror = 0.19060), apparently the printed result of a which.min call:

## 26 
## 26

Random Forest
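The fitting code is again not shown; presumably it mirrors the central-pixel forest with all 36 predictors (the object name is an assumption; mtry = sqrt(36) = 6):

set.seed(20)
sat_rf_full <- randomForest(C ~ ., data = sat_trn_full_df,
                            mtry = sqrt(ncol(sat_trn_full_df) - 1), importance = TRUE)
importance(sat_rf_full)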

##             1         2         3         4         5         6
## V1  14.347479  6.523180 12.785265  8.878950  9.215792 13.385304
## V2  25.080362  8.155885 10.329826 16.965498 13.801273 14.409123
## V3   7.750380  6.264034 10.674574 11.903256  8.868406  8.914603
## V4  11.671033  9.196722 16.190272 10.119210  9.599166 15.632320
## V5  20.327046  7.747617 14.732451  8.738162 12.260005 12.968500
## V6  22.024864  9.759857  8.967969 12.434673 15.591975 14.986900
## V7   8.193229  6.455881 11.596530 10.544200  7.867005  7.465667
## V8  16.440057  8.843152 10.938300  9.559178  9.546703 12.694114
## V9  27.245723  7.757990 11.797470  7.866552  9.557962 13.395249
## V10 19.668841 10.092615  7.702224  9.208108 14.255990 14.580319
## V11  8.183801  7.410958 12.179594 13.334420  5.674602 10.092250
## V12 13.869129  9.431361 12.276045 11.954095 10.082696 12.798514
## V13 20.251935  8.567868 16.168249  8.849309 11.890796 13.474792
## V14 22.215691 10.724722 10.223932 15.692282 15.305327 13.701134
## V15  7.862138  7.007398  9.906889 11.459271  7.941938 11.304583
## V16 16.505530 11.504687 13.949191  9.006578 11.792044 18.082491
## V17 25.253907 13.485706 19.712859 15.525689 20.078888 13.963230
## V18 21.396409 14.118719 14.385012 24.411052 24.468676 17.207413
## V19  8.560900  8.542848 10.437008 13.847183 10.239672 11.134022
## V20 19.027569 15.200911 11.888635 13.264401 11.666925 17.628338
## V21 29.431963  8.872517 17.675086 14.599193 15.461604 15.775755
## V22 20.051290 10.537004 11.771828 18.662235 19.376440 21.337874
## V23  9.749778  6.501513 12.096690 17.297252  8.046496  8.803671
## V24 15.582407  9.479060 14.049516 15.247126  9.821586 14.837700
## V25 23.697043  7.937527 16.986925  6.526053 10.340832 13.092715
## V26 23.867149  7.255891  9.029530 17.755496 10.805070 10.428707
## V27  7.894474  7.505076 12.865851  9.743226  8.409628  9.225300
## V28 14.727386 10.452824 13.915661  7.308338  9.939301 15.567830
## V29 22.313159  8.546988 15.018766 10.048277 10.401912 12.934963
## V30 17.037511  8.587731  8.721241 11.704029 13.527364 13.019518
## V31  8.545563  6.403730  8.403834 10.781731  9.028152  8.305877
## V32 16.176983  8.932304  8.691565 10.082833 11.848480 14.326251
## V33 24.911565  7.287095 15.507143  5.748705 10.478895 13.887847
## V34 18.162001  8.590430  9.937206  9.282485 16.150622 17.065096
## V35  8.769205  6.280038 12.924365 11.616752  8.099357 10.025106
## V36 16.279684  6.403133 13.129658  9.659609 11.145730 14.219610
##     MeanDecreaseAccuracy MeanDecreaseGini
## V1              19.84657         62.23829
## V2              32.28670         89.32196
## V3              18.44815         39.12481
## V4              20.91121         66.59387
## V5              22.37433        123.11917
## V6              31.22470        109.57090
## V7              15.57568         35.83360
## V8              21.46521         83.61104
## V9              27.50258        106.41434
## V10             29.38192         78.43511
## V11             18.88206         36.86583
## V12             23.55204         62.45378
## V13             21.89113        143.09708
## V14             30.72190        129.29556
## V15             18.39168         52.08460
## V16             25.45299        145.49467
## V17             27.32588        242.93899
## V18             33.44038        224.31144
## V19             16.77841         75.62461
## V20             23.97588        179.15180
## V21             30.97529        193.19072
## V22             33.09108        174.07645
## V23             21.39281         51.19399
## V24             23.28907        108.81224
## V25             26.31587        105.59507
## V26             30.82425         68.20054
## V27             20.38909         39.62732
## V28             21.72995         93.16480
## V29             24.21030        128.52837
## V30             27.27160         92.26471
## V31             17.24529         34.76598
## V32             20.34748         99.84895
## V33             29.07778        101.76329
## V34             29.70615        105.09069
## V35             21.32641         39.08788
## V36             24.29074         63.03664

K-Nearest Neighbors

Multinomial Logistic Regression

## Call:
## nnet::multinom(formula = .outcome ~ ., data = dat, decay = param$decay)
## 
## Coefficients:
##   (Intercept)          V1          V2           V3           V4         V5
## 2  -10.041277 0.011633320 -0.08866011  0.048736082 -0.007836211 0.20621349
## 3  -20.378721 0.011140595 -0.06815353  0.027252249  0.039038401 0.12890553
## 4  -10.005767 0.037706249 -0.02832780  0.005182813  0.030101942 0.08771919
## 5    4.310293 0.015662639 -0.05979296  0.025596502  0.028523005 0.13158697
## 6    2.981347 0.004889374 -0.06399075 -0.020918896  0.065396066 0.15362516
##            V6         V7          V8          V9         V10         V11
## 2 -0.07046031 0.05849743 -0.09916763 -0.23035971  0.05487740 -0.01018132
## 3 -0.01023674 0.02334025 -0.10407334  0.01094643 -0.07388672  0.10356648
## 4 -0.04171813 0.07508490 -0.08062009 -0.03705449 -0.05554305  0.05291187
## 5 -0.03302893 0.02956949 -0.14203745 -0.04364464 -0.07723677  0.10014638
## 6  0.01751298 0.01713530 -0.13431599 -0.03026042 -0.04818043  0.04464404
##            V12        V13         V14          V15        V16          V17
## 2  0.047024992 0.11602469 -0.06530551  0.012422682 0.09757902  0.196116606
## 3 -0.053235834 0.12028057 -0.05625099 -0.013169008 0.06185362  0.021216840
## 4 -0.029957741 0.07085136 -0.05742333  0.002926769 0.03030945  0.042819710
## 5 -0.013539372 0.03250508 -0.04543212  0.028452811 0.05963729 -0.009482224
## 6  0.003462489 0.03033801 -0.04426977  0.005334445 0.04055678 -0.009438855
##            V18        V19        V20          V21        V22          V23
## 2 -0.170050812 0.08470339 -0.2117381 -0.025080470 0.09688088 -0.028190776
## 3 -0.024565151 0.09392422 -0.1519914  0.043435687 0.02603040 -0.035636777
## 4 -0.052699586 0.06156555 -0.1323629  0.094373655 0.03255762 -0.007513793
## 5 -0.088499090 0.10572731 -0.2332036  0.086437722 0.01248936 -0.032978130
## 6 -0.002384114 0.07188680 -0.1813734  0.009126433 0.02387681 -0.025139193
##          V24          V25         V26       V27         V28         V29
## 2 0.14146854 -0.104255928  0.01286808 0.1156963 -0.07636536 0.081602559
## 3 0.11258387  0.005678118 -0.06977875 0.1568363 -0.09402118 0.053380932
## 4 0.03780018 -0.085349494 -0.02746782 0.1701236 -0.13727885 0.035347957
## 5 0.08979900 -0.044821516 -0.03564611 0.1139013 -0.11675376 0.007046051
## 6 0.07970401  0.010761963 -0.04017290 0.0994432 -0.12406979 0.055388797
##            V30        V31          V32        V33         V34         V35
## 2 -0.111281819 0.04687024 -0.035408254 0.11878873 -0.07399343  0.01072422
## 3 -0.037891231 0.01333758 -0.043551988 0.08918813 -0.02290963 -0.01924823
## 4 -0.061578044 0.09632598 -0.073387776 0.12374137 -0.08545925  0.02117023
## 5 -0.006815922 0.03950480 -0.006685306 0.09955379 -0.09604781  0.02921400
## 6 -0.067981417 0.03953664 -0.012509769 0.07101754 -0.01432718  0.01335151
##             V36
## 2 -0.0410039083
## 3 -0.0007439179
## 4 -0.0184314429
## 5 -0.0439894601
## 6 -0.0664708960
## 
## Std. Errors:
##   (Intercept)         V1         V2         V3         V4         V5         V6
## 2  0.01236324 0.04366473 0.03165150 0.03499723 0.03335968 0.05167846 0.03677143
## 3  0.34938369 0.03234255 0.02519188 0.02874874 0.02940229 0.03930459 0.02912072
## 4  0.47599482 0.03283715 0.02525149 0.02873599 0.02899265 0.04009069 0.02926500
## 5  0.21208527 0.03591308 0.02666621 0.02943234 0.02900005 0.04281700 0.03056582
## 6  0.46297310 0.03106560 0.02392134 0.02651756 0.02719780 0.03737165 0.02767196
##           V7         V8         V9        V10        V11        V12        V13
## 2 0.03733509 0.03632063 0.04617490 0.03001791 0.03639816 0.03208830 0.04764122
## 3 0.02961524 0.03253631 0.03356620 0.02303724 0.02866407 0.02740302 0.03573847
## 4 0.02978146 0.03212779 0.03435290 0.02316587 0.02881193 0.02726241 0.03650879
## 5 0.03103002 0.03225148 0.03707527 0.02425580 0.02974641 0.02713716 0.03961988
## 6 0.02748290 0.03039313 0.03195555 0.02175804 0.02653014 0.02534765 0.03453444
##          V14        V15        V16        V17        V18        V19        V20
## 2 0.03403094 0.03590879 0.03675849 0.05359208 0.03706961 0.03792888 0.03849453
## 3 0.02676861 0.02912685 0.03248246 0.04044278 0.02918168 0.02968905 0.03469929
## 4 0.02668878 0.02931989 0.03221597 0.04089251 0.02916575 0.03003330 0.03408425
## 5 0.02835822 0.03012988 0.03222492 0.04407052 0.03128037 0.03112775 0.03408286
## 6 0.02570572 0.02700195 0.03036498 0.03850554 0.02797236 0.02750367 0.03269578
##          V21        V22        V23        V24        V25        V26        V27
## 2 0.04834975 0.03228657 0.03700916 0.03470027 0.04309672 0.02927003 0.03513068
## 3 0.03655183 0.02639977 0.02891477 0.03103616 0.03285555 0.02365707 0.02859243
## 4 0.03738730 0.02649046 0.02922960 0.03027532 0.03350740 0.02355047 0.02886155
## 5 0.04043290 0.02791366 0.03021617 0.03024462 0.03589307 0.02509726 0.02983479
## 6 0.03477958 0.02531165 0.02676384 0.02889859 0.03143473 0.02279398 0.02667284
##          V28        V29        V30        V31        V32        V33        V34
## 2 0.03391676 0.05116816 0.03648290 0.03758016 0.03769108 0.04256800 0.02926133
## 3 0.02951688 0.03896867 0.02922321 0.03015095 0.03424834 0.03124130 0.02349621
## 4 0.02929196 0.03953727 0.02916499 0.03031922 0.03362662 0.03277022 0.02353478
## 5 0.02966845 0.04199949 0.03090811 0.03129439 0.03337880 0.03614112 0.02492615
## 6 0.02750901 0.03670997 0.02787971 0.02790729 0.03212398 0.03059734 0.02249722
##          V35        V36
## 2 0.03600778 0.03289523
## 3 0.02876995 0.02939106
## 4 0.02880899 0.02859065
## 5 0.02978972 0.02839475
## 6 0.02666640 0.02697115
## 
## Residual Deviance: 4740.439 
## AIC: 5110.439

Linear Discriminant Analysis

## correctness rate: 0.56054;  in: "V18";  variables (1): V18 
## correctness rate: 0.79776;  in: "V17";  variables (2): V18, V17 
## correctness rate: 0.82797;  in: "V15";  variables (3): V18, V17, V15 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00       21.95
## method      : lda 
## final model : C ~ V15 + V17 + V18
## <environment: 0x0000021ab66af5c0>
## 
## correctness rate = 0.828

Quadratic Discriminant Analysis

## correctness rate: 0.58107;  in: "V20";  variables (1): V20 
## correctness rate: 0.80496;  in: "V17";  variables (2): V20, V17 
## correctness rate: 0.84849;  in: "V18";  variables (3): V20, V17, V18 
## 
##  hr.elapsed min.elapsed sec.elapsed 
##        0.00        0.00       22.47
## method      : qda 
## final model : C ~ V17 + V18 + V20
## <environment: 0x0000021aa897c590>
## 
## correctness rate = 0.8485

Comparison of the 10-fold CV Errors


Evaluating the Models on the Test Set


Decision Tree

Random Forest

Multinomial Logistic Regression

K-Nearest Neighbors

Linear Discriminant Analysis

Quadratic Discriminant Analysis

Comparison of the Models' Classification Errors

models <- c("rpart", "RF", "KNN", "RLM","LDA","QDA")
# Test-set errors of the full-attribute models (presumably recomputed in the
# evaluation chunks above, overwriting the central-pixel values)
CLErrs <- c(CLErr_rpart,CLErr_rf,CLErr_knn,CLErr_mlr,CLErr_LDA,CLErr_QDA)

CLErr_df <- data.frame(Label = models, Value = CLErrs)

CErr_plot_full <- ggplot(CLErr_df, aes(x = Label, y = Value)) + 
  geom_bar(stat = "identity", color = "black", fill = "red") +
  labs(x = "Models", y = "Classification Error", title = "Comparison of Model Classification Errors") +
  theme(plot.title = element_text(hjust = 0.5))

CErr_plot_full


General Observations and Conclusions


  • Comparing the models' classification errors when only the central pixel is used against those obtained with the whole dataset (see the grid of plots below), we conclude that the statement's hypothesis, that the four central-pixel attributes suffice, does not hold: using all the attributes yields a considerable reduction in classification error, along with greater variability across models.
CVErr_plot$labels$title <- "CV Error (central pixel)"
CErr_plot$labels$title <-  "Classification Error (central pixel)"
CVErr_plot_full$labels$title <- "CV Error (full)"
CErr_plot_full$labels$title <- "Classification Error (full)"

grid.arrange(CVErr_plot,
            CErr_plot,
            CVErr_plot_full,
            CErr_plot_full, 
            ncol = 2)

  • Considering all the data, the best result is obtained with Random Forest, with a classification error of roughly 8%, followed by KNN with a classification error of roughly 9%.

< TO BE COMPLETED >